Journal of Power Sources 175 (2008) 419-429 




On-line fault diagnostic system for proton exchange 
membrane fuel cells 


Luis Alberto M. Riascos a,*, Marcelo G. Simoes b, Paulo E. Miyagi c


a Federal University of ABC, r. Santa Adelia 166, CEP 09210-170, Santo Andre, Sao Paulo, Brazil
b Colorado School of Mines, 1500 Illinois St, 80401 Golden, CO, USA
c Escola Politecnica, University of Sao Paulo, Av. Prof. Mello Moraes 2231, CEP 05508-900, Sao Paulo, Brazil


Received 3 July 2007; received in revised form 2 September 2007; accepted 3 September 2007 
Available online 14 September 2007 


Abstract 


In this paper, a supervisor system, able to diagnose different types of faults during the operation of a proton exchange membrane fuel cell 
is introduced. The diagnosis is developed by applying Bayesian networks, which qualify and quantify the cause-effect relationship among the 
variables of the process. The fault diagnosis is based on the on-line monitoring of variables easy to measure in the machine such as voltage, 
electric current, and temperature. The equipment is a fuel cell system which can operate even when a fault occurs. The fault effects are based on 
experiments on the fault tolerant fuel cell, which are reproduced in a fuel cell model. A database of fault records is constructed from the fuel cell 
model, improving the generation time and avoiding permanent damage to the equipment. 


© 2007 Elsevier B.V. All rights reserved.


1. Introduction


In the last few years, major efforts to reduce greenhouse gas emissions have increased the demand for pollution-free energy sources. Governmental and private-sector investments in R&D to support clean energy generation programs, including hydrogen-based ones, are under way.

Fuel cells are electrochemical devices that generate electricity; they are similar to batteries but can be fueled continuously. Recent developments in proton exchange membrane fuel cell (PEMFC) technology have made it the most promising for stationary and mobile applications in the range of up to 200 kW. PEMFCs are characterized by high efficiency, high power density, no harm to the environment, no moving parts, and superior reliability and durability.

Under a certain pressure, hydrogen (H2) is supplied to a porous conductive electrode (the anode). The H2 spreads through the electrode until it reaches the catalytic layer of the anode, where it reacts, separating into protons and electrons. The H+ protons flow through the electrolyte (a solid membrane), and the electrons pass through an external electrical circuit, producing electrical energy. On the other side of the cell, the oxygen (O2) spreads through the cathode and reaches its catalytic layer; on this layer, the O2, H+ protons, and electrons produce liquid water and residual heat as by-products [1].

Several papers have been published considering the fuel cell 
(FC) operation in normal conditions; but few of them addressed 
the FC operation under fault conditions. Faults are events that 
cannot be ignored in any real machine, and their consideration is 
essential for improving the operability, flexibility, and autonomy 
of automatic equipment. 

In this paper, a fault diagnostic supervisor is designed to execute on-line diagnosis, indicating the cause of an incipient fault. The supervisor uses a Bayesian network to establish the cause-effect relationship and to calculate the probability of the most likely fault cause. An early alert of an incipient fault allows decisions to be made that avoid degradation of other components and catastrophic faults in the equipment. A FC model able to reproduce the effects of faults on a fuel cell is developed. The supervisor and the FC model were integrated using the MatLab/SimuLink® environment to confirm the characteristics of this interaction.

This paper is organized as follows. In Section 2, the monitoring of the fault tolerant fuel cell (FTFC) is presented. In Section 3, the FC model is introduced. In Section 4, four types of faults in PEMFC are considered: faults in the air fan, faults in the refrigeration system, growth of the fuel crossover, and faults in hydrogen pressure. Section 5 presents a short background of Bayesian networks. Section 6 introduces the fault diagnostic supervisor for PEMFC.


2. The fault tolerant fuel cell (FTFC) 


The design of a fault diagnostic supervisor requires the analysis of the operation of a FC in fault conditions; for this purpose, a FTFC was constructed at the PSERC laboratory of the Colorado School of Mines (CSM) [2]. The FTFC is composed of the control system, the sensor system, and the power system. The control system allows the adjustment of the speed of the air-reaction blower and the refrigeration blower. The sensor system allows monitoring of the voltage (V_s), electric current (I_FC), temperature, and relative humidity (HR_out). The power system is composed of one AvistaLabs cartridge containing four proton exchange membranes (PEM). The control of the FTFC can be executed by microcontrollers (inside the FTFC) or by a PC (using LabView®). LabView® is also used for monitoring the variables and the speed of the blowers. The air for reaction and the air for refrigeration follow separate routes, which simplifies the monitoring of some variables.

The FTFC allows the operation (and the monitoring) of the system even when faults occur. Fig. 1 illustrates the monitoring of the FTFC; this figure shows the FTFC, the load, and a desktop computer with the LabView® software executing the monitoring.

Fig. 1. Monitoring the FTFC.


Fig. 2 illustrates the evolution of several variables, such as output stack voltage (V_s), electric current (I_FC), temperature, relative humidity (HR_out), and airflow volume, recorded with the LabView® software when the FTFC operates in normal conditions.

The FTFC was tested in different fault conditions. Fig. 3 illustrates the evolution of the output voltage (V_s), electric current (I_FC), and relative humidity (HR_out) when the H2 pressure is reduced at t = 10 min.

Fig. 4 illustrates the evolution of the output voltage (V_s), electric current (I_FC), and relative humidity (HR_out) when the air-reaction volume is reduced at t = 30 min.

Unfortunately, the generation of each case requires about 2 h of supervised experiments; therefore, the construction of a database with a considerable number of cases becomes highly time-consuming. Also, fault effects such as membrane breaking or drying of the membrane imply permanent damage to the FTFC. The effects of different types of faults can instead be simulated by adapting a FC model, which avoids damage to the equipment and shortens the time needed to generate fault records.

Fig. 2. Evolution of the FTFC's variables in normal conditions.

Fig. 3. Evolution of FTFC's variables by reduction of H2 pressure.

Fig. 4. Evolution of FTFC's variables by reduction of air-reaction volume.


3. The fuel cell model 


Several mathematical models of PEMFC can be found in the literature [1,3-5]. Basically, a model of a PEMFC consists of an electro-chemical part and a thermo-dynamical part. Correa et al. [3] introduce an electro-chemical model of a PEMFC; to validate this model, the polarization curve obtained with it is compared to the polarization curve of the manufacturer's data sheet. In Ref. [6], the thermo-dynamical part of the model and the effects of different types of faults are included.

The FC model is based on the calculation of voltage, temperature, and humidity, according to the equations considered in Refs. [1,3]. The voltage V_FC of a single cell can be defined as the result of the following expression [1]:

$$V_{FC} = E_{Nernst} - V_{act} - V_{ohmic} - V_{con} \tag{1}$$

E_Nernst is the thermodynamic potential of the cell representing its reversible voltage:

$$E_{Nernst} = 1.229 - 0.85 \times 10^{-3}\,(T - 298.15) + 4.31 \times 10^{-5}\,T \left[ \ln(P_{H_2}) + \tfrac{1}{2} \ln(P_{O_2}) \right] \tag{2}$$

where P_H2 and P_O2 (atm) are the hydrogen and oxygen pressures, respectively, and T (K) is the operating temperature. V_act is the voltage drop due to the activation of the anode and the cathode:

$$V_{act} = -\left[ \xi_1 + \xi_2 T + \xi_3 T \ln(c_{O_2}) + \xi_4 T \ln(I_{FC}) \right] \tag{3}$$

where ξ_i (i = 1...4) are specific coefficients for every type of FC, I_FC (A) is the electric current, and c_O2 (atm) is the oxygen concentration.

V_ohmic is the ohmic voltage drop associated with the conduction of protons through the solid electrolyte, and of electrons through the internal electronic resistance:

$$V_{ohmic} = I_{FC}\,(R_M + R_C) \tag{4}$$

where R_C (Ω) is the contact resistance to electron flow and R_M (Ω) is the resistance to proton transfer through the membrane:

$$R_M = \frac{\rho_M\,\ell}{A}, \qquad \rho_M = \frac{181.6\left[1 + 0.03\,(I_{FC}/A) + 0.062\,(T/303)^2\,(I_{FC}/A)^{2.5}\right]}{\left[\psi - 0.634 - 3\,(I_{FC}/A)\right] \exp\left[4.18\,((T - 303)/T)\right]} \tag{5}$$

where ρ_M (Ω cm) is the specific resistivity of the membrane, ℓ (cm) the thickness of the membrane, A (cm^2) the active area of the membrane, and ψ is a coefficient for every type of membrane.

V_con represents the voltage drop resulting from the mass transportation effects, which affect the concentration of the reacting gases:

$$V_{con} = -B \ln\left(1 - \frac{J}{J_{max}}\right) \tag{6}$$

where B (V) is a constant depending on the type of FC, J_max the maximum electric current density, and J the electric current density produced by the cell. In general, J = J_out + J_n, where J_out is the real electrical output current density and J_n is the fuel crossover and internal loss current.

Considering a stack composed of several FCs, and as an initial approximation, the output stack voltage can be considered as V_stack = n_r V_FC, where n_r is the number of cells composing the stack. However, constructive characteristics of the stack such as flow distribution and heat transfer should be taken into account [7-11].

The variation of temperature in the FC is obtained with the following differential equation [1]:

$$\frac{dT}{dt} = \frac{\Delta Q}{M\,C_s} \tag{7}$$

where M (kg) is the whole stack mass, C_s (J K^-1 kg^-1) the average specific heat coefficient of the stack, and ΔQ is the rate of heat variation (i.e. the difference between the rate of heat generated by the cell operation and the rate of heat removed). Heat can be removed in four ways: by the reaction air flowing inside the stack (Q_rem1), by the refrigeration system (Q_rem2), by water evaporation (Q_rem3), and by heat exchange with the surroundings (Q_rem4).
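A simple way to see how Eq. (7) behaves is to integrate it with an explicit Euler step. The sketch below is illustrative only: the function names are hypothetical, and the stack mass, specific heat and heat rates used when calling it are assumptions, since the text does not report them.

```python
def step_temperature(temp_k, q_gen, q_removed, mass_kg, c_s, dt):
    """One explicit-Euler step of Eq. (7): dT/dt = dQ / (M * C_s)."""
    delta_q = q_gen - sum(q_removed)  # W: generated heat minus removed heat
    return temp_k + dt * delta_q / (mass_kg * c_s)


def simulate(temp0, q_gen, q_removed, mass_kg, c_s, dt, steps):
    """Return the temperature trajectory for constant heat rates."""
    temps = [temp0]
    for _ in range(steps):
        temps.append(step_temperature(temps[-1], q_gen, q_removed,
                                      mass_kg, c_s, dt))
    return temps
```

Passing the four removal terms as a list makes it easy to emulate a refrigeration fault by setting the second entry (Q_rem2) to zero and comparing the two trajectories.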

Water forms at the cathode, and because the membrane elec- 
trolyte is very thin, water would diffuse from the cathode to 
the anode during the operation of the cell. The water formation 
would keep the electrolyte hydrated. This level of hydration is 
measured through the relative humidity of the output air. 

To calculate the relative humidity of the output air, the balance of water is established: output = input + internal generation, or in terms of the partial pressure of water: P_Wout = P_Win + P_Wgen. Since HR_out P_sat_out = P_Wout, the HR_out is

$$HR_{out} = \frac{P_{W_{in}} + P_{W_{gen}}}{P_{sat\_out}} \tag{8}$$

where P_Win is the partial pressure of the water in the inlet air, P_Wgen the partial pressure of the water generated by the chemical reaction, and P_sat_out is the saturated vapor pressure in the output air.


The P_sat is calculated from the following equation:

$$P_{sat} = \frac{T^{a} \exp\left((b/T) + c\right)}{10}$$

If T > 273.15 K, then a = −4.9283, b = −6763.28, and c = 54.22.

The rate of water production (kg s^-1) is calculated from the next equation [1]:

$$\dot{m}_{H_2O} = 9.34 \times 10^{-8}\, I_{FC}\, n_r$$
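The humidity relations can be checked numerically with a short sketch. It assumes that the saturation pressure expression reads P_sat = T^a exp((b/T) + c) / 10; with the stated coefficients this evaluates to roughly 100 at 373.15 K, which would be consistent with a result in kPa, although the unit is not stated in the text. The function names are hypothetical.

```python
import math

# Coefficients given in the text for T > 273.15 K.
A_COEF, B_COEF, C_COEF = -4.9283, -6763.28, 54.22


def p_sat(temp_k):
    """Saturated vapor pressure, assuming P_sat = T^a * exp(b/T + c) / 10."""
    return temp_k ** A_COEF * math.exp(B_COEF / temp_k + C_COEF) / 10.0


def hr_out(p_w_in, p_w_gen, temp_out_k):
    """Eq. (8) as a fraction (1.0 corresponds to 100 %).

    The partial pressures must be in the same unit as p_sat.
    """
    return (p_w_in + p_w_gen) / p_sat(temp_out_k)


def water_production(i_fc, n_r):
    """Rate of water production in kg/s for current I_FC (A) and n_r cells."""
    return 9.34e-8 * i_fc * n_r
```

Because P_sat grows quickly with temperature, the same amount of generated water gives a lower HR_out in a hotter stack, which is the drying mechanism discussed for refrigeration faults.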


For normal operation of the FC, proper temperature and humidity should be maintained. If the HR_out is much less than 100%, the membrane dries out and its conductivity decreases. On the other hand, a HR_out greater than 100% produces accumulation of liquid water on the electrodes, which become flooded and block the pores, making gas diffusion difficult. The result of these two conditions is a fairly narrow range of normal operating conditions. In abnormal conditions such as flooding or drying, parameters (such as R_C and ψ) that are normally constant start to vary. The parameters of the FTFC model for normal conditions are presented in Table 1.

In general, these parameters are based on manufacturing data and laboratory experiments, and their accuracy can affect the simulation results. In Ref. [12], a multi-parametric sensitivity analysis is performed to define the importance of the accuracy of each parameter. The accuracy was analyzed in normal conditions, considering variations of around ±10% of their normal values. However, in fault conditions, those variations can be stronger, as presented in Sections 4.1-4.4.

Table 1
Parameters of the FTFC

Parameter          Value
n_r                4
A (cm^2)           62.5
ℓ (μm)             25
P_O2 (atm)         0.2095
P_H2 (atm)         1.47628
R_C (Ω)            0.003
B (V)              0.015
ξ1                 −0.948
ξ2                 0.00286 + 0.0002 ln A + (4.3 × 10^-5) ln c_H2
ξ3                 7.22 × 10^-5
ξ4                 −1.06153 × 10^-4
ψ                  23.0
J_n (A cm^-2)      0.022
J_max (A cm^-2)    0.672


In Ref. [13], the water and thermal management in fuel 
cell systems were analyzed considering extra humidification 
at the cathode and anode. Forms of extra humidification can 
include liquid water injection, direct membrane humidification, 
recycling-humidification and many other methods; in Ref. [14], 
the parameters that affect the liquid water flux through the mem- 
brane and gas diffusion layer are analyzed. In Ref. [15], the 
dynamic performance of PEMFC is tested under various oper- 
ating conditions and load changes. 

Fig. 5 illustrates the effects of variation in temperature and HR_out, obtained by applying the FC model while maintaining constant stoichiometric air relationships (λ = 2, 4, 8). The stoichiometry (λ) is the ratio of the inlet air to the air necessary for the chemical reaction.

To avoid the membrane-drying problem, some researchers (e.g. [1,13]) have proposed extra humidification of the input reaction air. However, a variation in the HR of the input air produces a very small adjustment in the output HR; for example, a variation of 10% in the input HR represents a variation of approximately 2% in the output HR. Thus, in many cases, the extra humidification of the input air is not enough to resolve the drying problem. Fig. 6 illustrates the variation produced in the HR of the output air by the adjustment of the HR of the input air.

Fig. 5. Temperature and relative humidity HR_out for λ = constant.

Fig. 6. Variation of output HR by adjusting the input HR (axes: HR input (%), HR output (%)).


4. Faults in fuel cells 


In general, two categories of fault detection can be considered 
[16]: 


• Faults that can be detected by monitoring a specific variable. For example, fuel leaks can be detected by installing a specific gas sensor. In this case, a diagnosis is not necessary.

• Faults that cannot be detected directly by monitoring, or faults that need some type of diagnosis.


In practice, fault detection on commercial FC equipment is lim- 
ited to detection of faults of the first type. This work focuses on 
fault detection of the second type. 

Four types of faults in PEMFCs are considered in this study: 
fault in the air blower, fault in the refrigeration system, growth 
of fuel crossover and internal loss current, and fault in hydrogen 
pressure. The effects of these faults and the behavior of the FTFC 
in fault operating conditions are included in the FC model [6]. 


4.1. Faults in the air-reaction blower 


A reduction of the reaction air by a fault in the air blower can produce two major effects: (i) accumulation of liquid water that cannot be evaporated, thus affecting the resistivity of the electrodes, and (ii) reduction of the O2 concentration below that necessary for a complete reaction with the H2.

A common method for removing excess water inside the FC is to use the air flowing through it. The correct variation of the stoichiometry λ maintains the HR close to 100%. However, when a fault in the air blower takes place, this becomes impossible. This fault reduces the air-reaction flow, which reduces the water evaporation volume and permits the accumulation of water. A large accumulation of water causes flooding of the electrodes, making gas diffusion difficult and affecting the resistance of the electrodes and the performance of the FC. This effect is simulated by the following equation [6]:

$$R_{C(k)} = R_{C(0)} \left( \frac{W_{accum(k)}}{const_1} \right)^{\ldots} \tag{9}$$

where R_C(0) is the value of the variable at the initial state (normal condition), W_accum(k) the volume of water accumulated at instant k, and const_1 is a constant defining when the electrodes are led to flooding (the exponent in Eq. (9) is not legible in the source).

The second effect of a fault in the air-reaction blower occurs when λ is below the practical and recommended value. In this case, the O2 concentration is reduced and the exit air is completely depleted of O2. This reduction of O2 concentration produces a negative effect on E_Nernst (Eq. (2)) and an increase in V_act (Eq. (3)). Fig. 7 illustrates the evolution of output voltage (V_s), electric current (I_FC), accumulated water volume, relative humidity (HR_out), and stoichiometry when a partial fault in the air blower occurs at t = 30 min.


4.2. Fault in the refrigeration system 


The refrigeration system maintains the temperature within operating conditions. When the temperature increases, the reaction air has a drying effect and reduces the relative humidity (HR). A low HR can produce a catastrophic effect on the polymer electrolyte membrane, which not only relies totally upon a high water content, but is also very thin (and thus prone to rapid drying out). The drying of the membrane changes the membrane's resistance to proton flow (R_M). R_M is affected by the adjustment of ψ, which varies according to the following equation [6]:

$$\psi_{(k)} = \frac{\psi_{(0)}}{\left( const_2 / HR_{out(k)} \right)^{\ldots}} \tag{10}$$

where ψ_(0) is the value at the saturated condition (around 100% of HR), HR_out(k) is the relative humidity of the outlet air at instant k, and const_2 is a constant defining when the membrane is led to drying (the exponent in Eq. (10) is not legible in the source).

Fig. 8 illustrates the evolution of output voltage (V_s), electric current (I_FC), temperature, relative humidity (HR_out), and stoichiometry produced by a fault in the refrigeration system (i.e. a reduction of Q_rem2) at t = 30 min.


4.3. Increase of fuel crossover (Jn) 


A small amount of fuel is wasted by migrating through the membrane. This is defined as fuel crossover: some hydrogen diffuses from the anode (through the electrolyte) to the cathode, reacting directly with the oxygen and producing no current for the FC.

In normal conditions, the flow of fuel and electrons through the membrane (J_n) is very small, typically representing only a few mA cm^-2. A sudden increase in this parameter can be associated with rupture of the membrane. This variation of J_n produces an increase in the concentration voltage drop (V_con, Eq. (6)), and therefore a reduction of V_FC. Fig. 9 illustrates the evolution of output voltage (V_s), electric current (I_FC), generated heat (Q_gen), real output power (Pot_real), and stoichiometry produced by an increase in the fuel crossover (J_n) from 0.022 to 0.2 A cm^-2 at t = 30 min.

Fig. 7. Evolution of the FC model by air-reaction fault.

Fig. 8. Evolution of the FC model by refrigeration system fault.


4.4. Fault in hydrogen pressure 


In general, for mobile and stationary applications, hydrogen is supplied from a high-pressure bottle, and its pressure is reduced by a pressure regulator. In normal conditions, the hydrogen pressure is assumed to be constant (generally between 1 and 3 atm). A lower pressure negatively affects the performance of the FC. The reduction of H2 pressure decreases E_Nernst, increases V_act, and has a corresponding effect on V_FC. Fig. 10 illustrates the evolution of output voltage (V_s), electric current (I_FC), generated heat (Q_gen), stoichiometry, and relative humidity (HR_out) produced by a reduction of the H2 pressure.

In this section, the effects of four types of faults on the FC operation were explained simply and directly. However, when a fault occurs, an interconnected dependence among the variables is established; in general, all the variables undergo some kind of change. That hinders the diagnosis of the fault cause. To qualify and quantify the dependence among the variables, a Bayesian network is constructed to conduct the fault diagnosis.

Fig. 9. Evolution of the FC model by membrane breaking.

Fig. 10. Evolution of the FC model by H2 pressure fault.


5. Bayesian networks for fault diagnosis 


Bayesian networks have been extensively applied to fault diagnosis, e.g. [17,18]; however, their application in the area of fuel cells is new. In Ref. [17], a Bayesian network is implemented for controlling an unsupervised fault tolerant system to generate oxygen from the CO2 in the atmosphere of Mars. In Ref. [18], a Bayesian network is applied for fault diagnosis in a power delivery system. One advantage of Bayesian networks is that they allow the combination of expert knowledge of the process and probabilistic theory for the construction of a diagnostic procedure; indeed, both are recommended for the construction of a "good" Bayesian network.

A Bayesian network is a structure that graphically models relationships of probabilistic dependence within a group of variables. A Bayesian network B = (G, CP) is composed of the network structure G and the conditional probabilities (CP). A directed acyclic graph (DAG) represents the graphical structure G, where each node of the graph is associated with a variable X_i, and each node has a set of parents pa(X_i). The relationship between variables and their parents represents the cause-effect relationship. The conditional probabilities numerically quantify this cause-effect relationship [19].

The construction of a Bayesian network for fault diagnosis begins with the collection of fault records; probabilistic methods are then applied for the generation of the cause-effect structure. This process consists of the following steps, described in detail in Ref. [6]:

(a) Construction of the database: the records are provided by the FC model implemented in MatLab®. Field experiments could also provide those records; however, two major problems should be considered: (i) large amounts of data are necessary, and the generation of each case takes around 2 h of supervised experiments, and (ii) variables such as Q_gen, flooding, λ, etc., are difficult to monitor.

(b) Implementation of search-and-score algorithms (e.g. the Bayesian-score (K2) [20] and MCMC [21]) to find the initial structure. The probabilistic approaches were implemented using the Bayesian Network Toolbox developed for MatLab® in Ref. [21].

(c) Groups of variables are arranged in layers. Fault causes, sensors, and pattern recognition are considered as layers.

(d) Constraint-based conditions and knowledge are applied for improving the structure.

(e) Calculation of conditional probabilities on the final structure. In this research, the maximum posteriori likelihood algorithm [22] was applied.

5.1. Generation of the database 


Binary states of the variables are considered (0 = normal, 1 = abnormal). The general procedure is to monitor a specific variable; if, after a fault takes place, the value of that variable leaves a certain tolerance band, then its flag is set to "1". Fig. 11 represents the range of tolerance and the evolution of I_FC after a fault at t = 30 min.

The next step is the construction of a vector containing the value of all variables. This vector corresponds to a single case in the database, with the values of all variables considered in a certain period. A database of fault records with 10,000 cases was constructed for the structure learning of a Bayesian network for fault diagnosis in fuel cells. The database considers different operational conditions, with different fault causes simulated and selected in a random sequence.

Fig. 11. Evolution of I_FC by a fault at t = 30 min.
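The conversion of monitored signals into one binary case can be sketched as follows. This is an assumed reading of the procedure described above, not the authors' code: the function names, the symmetric tolerance band around a nominal value, and the example signal names are hypothetical.

```python
def abnormal_flag(samples, nominal, tolerance):
    """Return 1 if any sample leaves the band nominal +/- tolerance, else 0."""
    return int(any(abs(x - nominal) > tolerance for x in samples))


def build_case(window, nominal, tolerance):
    """Build one database record (variable name -> 0/1 flag).

    window maps each variable name to the samples observed in the period;
    nominal and tolerance map each name to its band center and half-width.
    """
    return {name: abnormal_flag(values, nominal[name], tolerance[name])
            for name, values in window.items()}
```

Calling `build_case` once per simulated period, with a randomly selected fault cause, would produce one row of the database.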

From the mathematical model, the evolution of variables that can be difficult to monitor on a real machine (such as Q_gen or λ) can be observed. Records of all variables are essential for the construction of the network structure without hidden variables. The calculation of the diagnosis is simpler if there are no hidden variables [23].

The variables considered are the following: 


J_n = fault by fuel crossover
aF = fault in the air blower
rF = fault in the refrigeration system
H2 = fault by low H2 pressure
Fl = volume of air flow
Q_gen = generated heat
λ = stoichiometric air relationship
HR = output relative humidity
Dr = drying of membrane
Fd = flooding of electrodes
Ov = overload
V = stack voltage
I_FC = electrical current of the FC
T = temperature
Pow = difference between real output power and required load
P_H2 = H2 pressure


5.2. Search-and-score algorithms 


The Bayesian-score (K2) and the Markov Chain Monte Carlo (MCMC) algorithms were implemented separately. For each node, the K2 algorithm adds the parent whose addition most increases the score of the resulting structure. When no single additional parent increases the score, it stops adding parents to that node and goes to the next node. Before the algorithm begins, the possible parents of every variable must be defined, i.e. an ordering of the variables. Therefore, human-expert experience is important to define that order.
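The greedy parent selection described above can be sketched in Python. The work itself used the Bayesian Network Toolbox for MatLab®, so the code below is only an illustrative re-implementation under assumptions: it uses the Cooper-Herskovits (K2) metric in logarithmic form, all variables are discrete with the same number of states, and the names `k2_log_score`, `k2` and `max_parents` are hypothetical.

```python
import math
from collections import Counter


def k2_log_score(data, child, parents, arity=2):
    """Log K2 (Cooper-Herskovits) score of `child` given `parents`.

    data is a list of rows; child and parents are column indices.
    """
    counts = Counter((tuple(row[p] for p in parents), row[child])
                     for row in data)
    totals = Counter()
    for (cfg, _), n in counts.items():
        totals[cfg] += n
    score = 0.0
    for cfg, n_ij in totals.items():
        score += math.lgamma(arity) - math.lgamma(n_ij + arity)
        for state in range(arity):
            score += math.lgamma(counts[(cfg, state)] + 1)
    return score


def k2(data, order, max_parents=2):
    """Greedy K2 search; only nodes earlier in `order` may become parents."""
    parents = {node: [] for node in order}
    for pos, node in enumerate(order):
        best = k2_log_score(data, node, parents[node])
        while len(parents[node]) < max_parents:
            candidates = [c for c in order[:pos] if c not in parents[node]]
            scored = [(k2_log_score(data, node, parents[node] + [c]), c)
                      for c in candidates]
            if not scored:
                break
            top_score, top = max(scored)
            if top_score <= best:
                break  # no single parent improves the score
            parents[node].append(top)
            best = top_score
    return parents
```

The `order` argument is where the expert knowledge enters: placing fault causes before pattern variables and sensors restricts the arcs the search may propose.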

The MCMC algorithm starts at a specific point in the space 
of all possible DAGs. The search is performed through all the 
nearest neighbors, and it moves to the neighbor that has the 
highest score. If no neighbor has a higher score than the current 
point, a local maximum was reached, and the algorithm stops. 
A neighbor is the graph that can be generated from the current 
graph by adding, deleting or reversing a single arc. 
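The neighborhood definition above (add, delete, or reverse a single arc while keeping the graph acyclic) and the greedy move to the best-scoring neighbor can be sketched as follows. This follows the search as worded in the preceding paragraph; it is a sketch with hypothetical function names, and `score` stands for any structure-scoring function supplied by the caller.

```python
from itertools import permutations


def is_acyclic(nodes, edges):
    """Check acyclicity by repeatedly removing nodes with no incoming arc."""
    remaining = set(nodes)
    arcs = set(edges)
    while remaining:
        roots = [n for n in remaining if not any(v == n for _, v in arcs)]
        if not roots:
            return False  # every remaining node has a parent: a cycle exists
        for r in roots:
            remaining.discard(r)
        arcs = {(u, v) for u, v in arcs if u in remaining}
    return True


def neighbors(nodes, edges):
    """All DAGs reachable by adding, deleting, or reversing one arc."""
    edges = frozenset(edges)
    result = []
    for u, v in permutations(nodes, 2):
        if (u, v) in edges:
            result.append(edges - {(u, v)})                  # delete
            reversed_graph = (edges - {(u, v)}) | {(v, u)}   # reverse
            if is_acyclic(nodes, reversed_graph):
                result.append(reversed_graph)
        elif (v, u) not in edges:
            added = edges | {(u, v)}                         # add
            if is_acyclic(nodes, added):
                result.append(added)
    return result


def hill_climb(nodes, edges, score):
    """Move to the best neighbor until no neighbor scores higher."""
    current = frozenset(edges)
    while True:
        best = max(neighbors(nodes, current), key=score, default=current)
        if score(best) <= score(current):
            return current  # local maximum reached
        current = best
```

Because the search stops at the first local maximum, the result depends on the starting graph, which is one reason the outcome is treated only as an initial approximation.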

In practice, the search-and-score algorithms are not exact and are used only as initial approximations. Also, since the Bayesian-score and MCMC algorithms apply different tradeoffs for searching the structure, they can produce different results; therefore, knowledge about the conditional independence among the variables should be applied to obtain the resulting graph.


5.3. Layers of the Bayesian network 


For a better understanding of the relationship among variables, they are separated into several layers. In the final structure, three layers are considered: fault causes, sensors, and pattern recognition. Fault causes are the possible causes of the fault, such as fault in the air fan (aF), fault in the refrigeration system (rF), growth of J_n, and low H2 pressure. Sensors are the variables that can be easily monitored by sensors, such as output voltage (V_s), electric current (I_FC), power, temperature, and H2 pressure (P_H2). Pattern recognition is associated with variables that are difficult to monitor in a real machine, but that play an important role in a cause-effect structure and define a fault pattern. Those variables are: generated heat (Q_gen), stoichiometric air relationship (λ), volume of air flow (Fl), drying of membrane (Dr), flooding of electrodes (Fd), overload (Ov) (i.e. the FC is working close to the maximum load; in those cases, some variables evolve differently), and relative humidity (HR_out) (note that HR_out can only be measured between 0% and 100%).


5.4. Constraint-based conditions and knowledge


First, the fusion of the results from several probabilistic algorithms confirms the edges present in different structures; second, the remaining edges are candidates for erasing based on constraints and domain knowledge. This process is described in detail in Ref. [6].

Some of the considered constraints are: (i) the independent fault cause assumption, i.e. only one fault takes place at a time, and one fault cause does not influence another fault cause; (ii) independent sensors, i.e. edges among sensors can be erased because their values are always observed.

Fig. 12 illustrates the resulting Bayesian network structure for fault diagnosis in PEMFC. The conditional probabilities (CP) are obtained by the maximum posteriori likelihood algorithm [23] on the network structure shown in Fig. 12. The Bayesian network B, composed of the network structure G plus CP, is then ready to be used for fault diagnosis in a PEMFC.

Fig. 12. Bayesian network structure for fault diagnosis in a PEMFC.
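For binary variables, one common way to obtain conditional probabilities from the fault records is a maximum a posteriori (MAP) estimate under a symmetric Beta prior. The sketch below assumes that reading; it may differ in detail from the algorithm of the cited reference, and the function name, the prior parameter `alpha` and the record layout are hypothetical.

```python
from collections import Counter


def estimate_cpt(records, child, parents, alpha=2.0):
    """MAP estimate of P(child = 1 | parent configuration).

    Uses a Beta(alpha, alpha) prior with alpha > 1, for which the MAP value
    is (k + alpha - 1) / (n + 2 * alpha - 2), with k ones out of n records.
    Only parent configurations present in the records are returned.
    """
    ones, totals = Counter(), Counter()
    for rec in records:
        cfg = tuple(rec[p] for p in parents)
        totals[cfg] += 1
        ones[cfg] += rec[child]
    return {cfg: (ones[cfg] + alpha - 1) / (totals[cfg] + 2 * alpha - 2)
            for cfg in totals}
```

The prior keeps the estimates away from exactly 0 or 1 when a parent configuration is rare in the database.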

Network structures representing a diagnostic process play 
a fundamental role for fault tolerant machines since they can 
be associated with fault treatment processes (i.e. performing 
the fault diagnosis to identify the fault cause and executing 
the automatic recovery process). In Refs. [24—26] fault detec- 
tion and fault treatment were integrated; the case studies were 
automatic recovery processes in electric autonomous guided 
vehicles, machining processes and factories. 


6. The on-line fault diagnosis 


An inference is the computation of a probability p(X_q | X_E), where X_q is the variable of interest (e.g. the most probable fault cause) and X_E is the variable or set of variables that have been observed (i.e. the effects observed by sensors and transformed into logic outputs).

There are many different algorithms for calculating the infer- 
ence in Bayesian networks, which apply different tradeoffs 
between speed, complexity, generality, and accuracy. The on- 
line fault supervisor executes the fault diagnostic inference 
by applying the variable elimination algorithm, which can be 
applied to any type of Bayesian network structure [27]. 
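To make the inference step concrete, the sketch below computes p(X_q | X_E) on a toy network by brute-force enumeration of the joint distribution, which gives the same result that variable elimination computes more efficiently. The three-arc network and all probability values are invented for illustration and are not the structure or the conditional probabilities of Fig. 12.

```python
from itertools import product

# Hypothetical toy network: rF -> T, rF -> V, Jn -> V (all variables binary).
PARENTS = {"rF": (), "Jn": (), "T": ("rF",), "V": ("rF", "Jn")}
# P(node = 1 | parent values); the numbers are invented for illustration.
CPT = {
    "rF": {(): 0.05},
    "Jn": {(): 0.05},
    "T": {(0,): 0.02, (1,): 0.95},
    "V": {(0, 0): 0.01, (0, 1): 0.90, (1, 0): 0.60, (1, 1): 0.95},
}
ORDER = list(PARENTS)


def joint(assign):
    """Joint probability of a full assignment via the chain rule of the DAG."""
    p = 1.0
    for node in ORDER:
        p1 = CPT[node][tuple(assign[q] for q in PARENTS[node])]
        p *= p1 if assign[node] == 1 else 1.0 - p1
    return p


def posterior(query, evidence):
    """P(query = 1 | evidence), summing the joint over the hidden variables."""
    hidden = [n for n in ORDER if n not in evidence and n != query]
    weights = {0: 0.0, 1: 0.0}
    for q_val in (0, 1):
        for values in product((0, 1), repeat=len(hidden)):
            assign = dict(evidence, **dict(zip(hidden, values)))
            assign[query] = q_val
            weights[q_val] += joint(assign)
    return weights[1] / (weights[0] + weights[1])
```

Each fault cause is a separate node with its own posterior, so the probabilities reported for different causes need not add up to 100%.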

The fault diagnostic fuel cell system is composed of several subsystems: the fuel cell stack and controller, the supervisor, and the peripheral subsystems. The fuel cell stack contains the electro-chemical and thermo-dynamical parts of the model, which calculate voltage, temperature, and humidity. The controller calculates the volume of air-reaction and turns the refrigeration subsystem on or off according to the performance of the process. The supervisor verifies the correct operation of the FC. If the monitored variables show abnormal changes, the supervisor executes the diagnostic process. The peripheral subsystems provide the air for the chemical reaction, the hydrogen, the refrigeration, and the load. The environmental conditions are a temperature of 25 °C and a relative humidity of 40%.

Fig. 13 illustrates the execution of an on-line fault diagnosis. This test was performed by externally forcing the output of the refrigeration subsystem to zero (this simulates a fault condition). In this case, the supervisor detects abnormal variations in some variables during the operation of the FC. Then, the conditional probabilities are calculated for all fault causes (J_n, aF, rF, and H2) and shown on the supervisor's display. According to the supervisor, the most probable fault cause is rF (fault in the refrigeration system) with 94% probability. The second most probable cause is an increase of J_n with 44% probability. Causes aF and H2 have 0% probability.

Fig. 13. On-line fault diagnosis execution.

In all tests performed, the supervisor always indicated the 
true cause as the most probable cause. 

In general, the variation of electrical variables (such as output voltage (V_s) and electric current (I_FC)) is faster than the variation of thermo-dynamical variables (e.g. temperature). Therefore, the diagnosis of faults such as rF takes more time (in this case, around 20 s); this speed also depends on the accuracy of the sensors. According to our experience, a worst-case scenario still allows fault detection in less than 1 min. Even 1 min is fast enough for detecting incipient faults before a catastrophic effect takes place in the fuel cell system.


7. Conclusions 


The design of a supervisor system to perform on-line fault 
diagnosis in PEM fuel cells was implemented. The execution of 
the diagnosis was based on a Bayesian network, which qualifies 
and quantifies the cause-effect relationship within the variables. 

Fault records of some variables were constructed including 
variables difficult to monitor in a real machine. The record of all 
relevant variables is essential for the construction of the network 
structure avoiding hidden variables, especially in intermediary 
layers. 

After the construction of the Bayesian network, the inference 
calculation is based on observations of variables easy to monitor 
with sensors such as voltage, electric current, temperature, etc. 
This allows the implementation of fault diagnostic processes in 
a real machine. 

The fault diagnostic tests have shown agreement between the inference results and the original fault causes.

In general, the fault diagnostic tests were fast enough to detect incipient faults before a catastrophic effect took place in the fuel cell system.

Topics such as the study of fault effects in fuel cells, the construction of network structures for fault diagnosis in fuel cells, and their association with fault treatment processes are still under study and remain open to research contributions.